SoundCloud Analysis

Author

Hajara Muzammal

Introduction:

Music streaming platforms shape how listeners discover and consume music. While most large-scale analyses focus on Spotify or Billboard charts, SoundCloud plays a unique role by amplifying emerging artists, remixes, and niche genres.

In this project, we analyze a dataset of playlist tracks that includes direct SoundCloud links, playlist metadata, and audio features. Our goal is to understand:

Which musical characteristics are associated with popular songs

How playlist inclusion relates to popularity

Trends in danceability, tempo, key, and track length

By combining playlist-level data with song-level attributes, we explore patterns that define what makes songs more likely to appear in curated playlists.

Data Ingest

We use a publicly available dataset hosted on Hugging Face, which contains playlist metadata, song characteristics, and direct SoundCloud links.

library(readr)
library(dplyr)
library(tidyr)
library(ggplot2)

url <- "https://huggingface.co/datasets/Zuru7/Spotify_Songs_with_SoundCloud_links/resolve/main/song_df_normalised.csv"

SONGS <- read_csv(url, show_col_types = FALSE)
glimpse(SONGS)
Rows: 14,987
Columns: 23
$ track_name        <chr> "i feel alive", "poison", "baby it's cold outside (f…
$ track_artist      <chr> "steady rollin", "bell biv devoe", "ceelo green", "k…
$ lyrics            <chr> "the trees, are singing in the wind the sky blue, on…
$ track_album_name  <chr> "love & loss", "gold", "ceelo's magic moment", "kard…
$ track_popularity  <dbl> 28, 0, 41, 65, 70, 52, 36, 42, 1, 58, 69, 72, 74, 41…
$ playlist_name     <chr> "hard rock workout", "back in the day - r&b, new jac…
$ playlist_genre    <chr> "rock", "r&b", "r&b", "pop", "r&b", "r&b", "r&b", "e…
$ playlist_subgenre <chr> "hard rock", "new jack swing", "neo soul", "dance po…
$ danceability      <dbl> 0.2166860, 0.8447277, 0.3580533, 0.7462341, 0.440324…
$ energy            <dbl> 0.8779620, 0.6460897, 0.3674362, 0.8850809, 0.632868…
$ key               <dbl> 0.81818182, 0.54545455, 0.45454545, 0.81818182, 0.54…
$ loudness          <dbl> 0.7817377, 0.6813893, 0.7425419, 0.8813965, 0.730275…
$ mode              <dbl> 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1…
$ speechiness       <dbl> 0.02434122, 0.21616793, 0.01306387, 0.02065654, 0.03…
$ acousticness      <dbl> 0.011792960, 0.004353434, 0.694556021, 0.037297028, …
$ instrumentalness  <dbl> 0.010205339, 0.007422998, 0.000000000, 0.000000000, …
$ liveness          <dbl> 0.34221195, 0.48613476, 0.05781237, 0.13038190, 0.08…
$ valence           <dbl> 0.4080748, 0.6565622, 0.4090849, 0.2424166, 0.308073…
$ tempo             <dbl> 0.5545093, 0.4227024, 0.4605076, 0.5250801, 0.625378…
$ language          <chr> "en", "en", "en", "en", "en", "en", "en", "en", "es"…
$ sentiment         <chr> "Positive", "Positive", "Positive", "Negative", "Pos…
$ song_artist       <chr> "i feel alive steady rollin", "poison bell biv devoe…
$ links             <chr> "http://soundcloud.com/xobak3r/purple-vision-ft-xoro…

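The dataset contains 14,987 rows and 23 columns, including audio features (danceability, energy, valence, tempo), playlist information, popularity scores, and SoundCloud links. As the file name song_df_normalised.csv suggests, the audio features appear to be rescaled to the 0 to 1 range, so values such as tempo and key are not in their original units.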

Data Cleaning

Rename columns for consistency

SONGS <- SONGS %>%
  rename(
    track        = track_name,
    artist       = track_artist,
    album        = track_album_name,
    popularity   = track_popularity,
    genre        = playlist_genre,
    subgenre     = playlist_subgenre,
    soundcloud   = links
  )

Remove rows with a missing track, artist, or popularity value

SONGS <- SONGS %>%
  filter(
    !is.na(track),
    !is.na(artist),
    !is.na(popularity)
  )
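
The filter above drops rows with missing values only. A minimal sketch of an additional range check follows, assuming that popularity lies on a 0 to 100 scale and that normalised features such as danceability lie between 0 and 1. Both ranges are assumptions inferred from the glimpse() output, and the toy data frame is made up for illustration.

```r
library(dplyr)

# Toy data standing in for SONGS (hypothetical values, not from the dataset)
toy <- data.frame(
  track        = c("a", "b", "c", "d"),
  popularity   = c(55, 120, 70, -3),
  danceability = c(0.40, 0.80, 1.20, 0.50)
)

# Keep only rows whose values fall inside the assumed valid ranges
valid <- toy %>%
  filter(
    between(popularity, 0, 100),
    between(danceability, 0, 1)
  )

print(valid)
```

Running the same filter on SONGS and comparing nrow() before and after would show whether any out-of-range rows exist at all.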

Data Integration

PLAYLIST_TABLE <- SONGS %>%
  mutate(
    playlist_id = as.integer(as.factor(playlist_name)),
    track_id    = as.integer(as.factor(paste(track, artist))),
    artist_id   = as.integer(as.factor(artist)),
    album_id    = as.integer(as.factor(album))
  ) %>%
  select(
    playlist_name,
    playlist_id,
    artist,
    artist_id,
    track,
    track_id,
    album,
    album_id,
    popularity,
    danceability,
    energy,
    valence,
    tempo,
    soundcloud
  )
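
The integer IDs above come from converting each name to a factor and taking its integer codes, so identical strings share an ID. A small illustration on made-up playlist names (not taken from the dataset) shows the behavior:

```r
# Made-up playlist names, used only to illustrate the ID scheme
playlist_name <- c("rock hits", "chill", "rock hits", "pop now")

# Factor levels are the sorted unique values; the integer codes index into them
playlist_id <- as.integer(as.factor(playlist_name))

# Repeated names receive the same ID
print(data.frame(playlist_name, playlist_id))
```

Because the codes depend on which values are present, these IDs are only stable within one version of the data and should not be treated as permanent keys.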

Data Exploration

We define a “popular song” as one with a popularity score greater than or equal to 70.

pop_threshold <- 70
print(pop_threshold)
[1] 70
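
One way the threshold could be put to use is to flag popular tracks and compare average audio features between the two groups. The sketch below runs on a made-up data frame; the column is_popular and the summary column names are hypothetical and do not appear in the original analysis.

```r
library(dplyr)

pop_threshold <- 70

# Toy data standing in for SONGS (hypothetical values)
toy <- data.frame(
  popularity   = c(82, 75, 40, 10, 70, 55),
  danceability = c(0.9, 0.7, 0.5, 0.3, 0.8, 0.6),
  energy       = c(0.8, 0.6, 0.7, 0.4, 0.7, 0.5)
)

# Flag tracks at or above the threshold, then average features per group
feature_summary <- toy %>%
  mutate(is_popular = popularity >= pop_threshold) %>%
  group_by(is_popular) %>%
  summarise(
    n_tracks          = n(),
    mean_danceability = mean(danceability),
    mean_energy       = mean(energy)
  )

print(feature_summary)
```

Replacing toy with SONGS would give the same comparison on the real data, provided the renamed columns from the cleaning step are in place.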

Popularity vs Playlist Appearances

track_counts <- PLAYLIST_TABLE %>%
  count(track, popularity)

ggplot(track_counts, aes(popularity, n)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Popularity vs Playlist Appearances",
    x = "Track Popularity",
    y = "Number of Playlist Appearances"
  ) +
  theme_minimal()

print(track_counts)
# A tibble: 14,944 × 3
   track                                              popularity     n
   <chr>                                                   <dbl> <int>
 1 $20 fine                                                   44     1
 2 $ave dat money (feat. fetty wap & rich homie quan)         69     1
 3 $dreams                                                    43     1
 4 '39 - 2011 mix                                             60     1
 5 '98 freestyle                                               0     1
 6 'til you do me right                                       39     1
 7 'till i collapse                                           83     1
 8 ...baby one more time                                      75     1
 9 ...ready for it? - bloodpop® remix                         50     1
10 ...til the cops come knockin'                              48     1
# ℹ 14,934 more rows

This plot shows a weak relationship between track popularity and the number of playlist appearances, with most tracks appearing in only one playlist regardless of popularity score. While a small number of moderately to highly popular tracks appear in multiple playlists, overall playlist inclusion does not strongly increase with popularity in this dataset.
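
Note that count(track, popularity) counts rows per title and popularity value, so tracks by different artists that share a title are merged, and repeated rows within one playlist are counted more than once. A possible refinement, shown here on made-up data and not part of the original analysis, groups by track and artist and counts distinct playlists. The names playlist_counts and n_playlists are hypothetical.

```r
library(dplyr)

# Toy data standing in for PLAYLIST_TABLE (hypothetical values)
toy <- data.frame(
  track         = c("intro", "intro", "intro", "sunrise", "sunrise"),
  artist        = c("artist a", "artist a", "artist b", "artist c", "artist c"),
  playlist_name = c("p1", "p2", "p1", "p1", "p1"),
  popularity    = c(60, 60, 20, 45, 45)
)

# One row per (track, artist), counting each playlist at most once
playlist_counts <- toy %>%
  group_by(track, artist) %>%
  summarise(
    n_playlists = n_distinct(playlist_name),
    popularity  = max(popularity),
    .groups     = "drop"
  )

print(playlist_counts)

# Rank correlation as a single-number summary of the relationship
rho <- cor(playlist_counts$popularity, playlist_counts$n_playlists,
           method = "spearman")
cat("Spearman rho:", rho, "\n")
```

A rank correlation is used here because playlist counts are small integers with many ties at 1, which makes a straight-line fit hard to interpret.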

Most danceable songs

SONGS %>%
  arrange(desc(danceability)) %>%
  select(track, artist, danceability, soundcloud) %>%
  head(5)
# A tibble: 5 × 4
  track                                           artist danceability soundcloud
  <chr>                                           <chr>         <dbl> <chr>     
1 ice ice baby                                    vanil…        1     http://so…
2 cha cha slide - original live platinum band mix dj ca…        0.999 http://so…
3 funky friday                                    dave          0.995 http://so…
4 bad bad bad (feat. lil baby)                    young…        0.994 http://so…
5 cinnamon girl - radio edit                      [dunk…        0.994 http://so…

The most danceable track in the dataset is "ice ice baby".

Danceability vs Popularity

ggplot(SONGS, aes(danceability, popularity)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Danceability vs Popularity",
    x = "Danceability",
    y = "Popularity"
  ) +
  theme_minimal()

This plot shows a weak but positive relationship between danceability and track popularity, indicating that more danceable songs tend to be slightly more popular on average. However, the wide dispersion of points suggests that danceability alone is not a strong predictor of popularity, and highly popular songs exist across a broad range of danceability values.
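
The strength of a relationship like this can be put into numbers with a correlation coefficient and the slope of the fitted line. The sketch below uses made-up values rather than the real dataset, so its output says nothing about the actual songs.

```r
# Toy data standing in for SONGS (hypothetical values)
toy <- data.frame(
  danceability = c(0.20, 0.35, 0.50, 0.65, 0.80, 0.95),
  popularity   = c(30, 55, 20, 60, 45, 70)
)

# Pearson correlation with a significance test
ct <- cor.test(toy$danceability, toy$popularity, method = "pearson")
r  <- unname(ct$estimate)

# The same linear fit that geom_smooth(method = "lm") draws
fit       <- lm(popularity ~ danceability, data = toy)
slope     <- unname(coef(fit)["danceability"])
r_squared <- summary(fit)$r.squared

cat("r =", r, " p =", ct$p.value, "\n")
cat("slope =", slope, " R^2 =", r_squared, "\n")
```

For a single predictor, R squared equals the squared Pearson correlation, so a small value of r implies that danceability accounts for only a small share of the variation in popularity.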

Tempo vs Popularity

ggplot(SONGS, aes(tempo, popularity)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "loess", se = FALSE) +
  labs(
    title = "Tempo vs Popularity",
    x = "Tempo",
    y = "Popularity"
  ) +
  theme_minimal()

The relationship between tempo and popularity appears weak and non-linear, with popularity remaining relatively stable across most tempo values. This suggests that tempo alone does not strongly influence a song’s popularity on playlists.
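
A non-linear pattern can also be checked without a smoother by splitting tempo into equal-sized bins and comparing the median popularity in each. This is a sketch on made-up values; the choice of four bins and the column names are assumptions for illustration.

```r
library(dplyr)

# Toy data standing in for SONGS (hypothetical values; tempo on a 0 to 1 scale)
toy <- data.frame(
  tempo      = c(0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80),
  popularity = c(40, 50, 45, 55, 60, 50, 42, 48)
)

# Split tracks into four equal-sized tempo bins and summarise each bin
tempo_bins <- toy %>%
  mutate(tempo_bin = ntile(tempo, 4)) %>%
  group_by(tempo_bin) %>%
  summarise(
    n_tracks          = n(),
    min_tempo         = min(tempo),
    max_tempo         = max(tempo),
    median_popularity = median(popularity)
  )

print(tempo_bins)
```

If the medians stay close to each other across bins on the real data, that would support the reading that tempo has little bearing on popularity.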

Conclusion:

Overall, the analysis suggests that while playlist exposure and audio features are related to popularity, no single characteristic explains why a song becomes popular on these SoundCloud-linked Spotify playlists. Playlist appearances show only a weak relationship with track popularity: most tracks, whether popular or not, appear in a single playlist. Danceability shows a mild positive association with popularity, implying that more danceable songs tend to perform slightly better, though the effect is not strong. Tempo, by contrast, shows no clear trend, with popularity staying roughly stable across tempo values. Popular songs are also concentrated at moderate-to-high energy levels and generally exhibit balanced valence, suggesting that listeners gravitate toward songs that are energetic but emotionally neutral to positive rather than extremely sad or euphoric. Taken together, these findings highlight that popularity is multifaceted: audio features contribute to success, but playlist dynamics, listener behavior, and external factors likely play an equally important role in shaping which songs gain widespread attention.